Abstract. Combining data from several case-control genome-wide
association (GWA) studies can yield greater efficiency for detecting
associations of disease with single nucleotide polymorphisms (SNPs) than
separate analyses of the component studies. We compared several pro- 
cedures to combine GWA study data both in terms of the power to 
detect a disease-associated SNP while controlling the genome-wide sig- 
nificance level, and in terms of the detection probability (DP). The 
DP is the probability that a particular disease-associated SNP will 
be among the T most promising SNPs selected on the basis of low 
p-values. We studied both fixed effects and random effects models in 
which associations varied across studies. In settings of practical rele- 
vance, meta-analytic approaches that focus on a single degree of free- 
dom had higher power and DP than global tests such as summing chi- 
square test-statistics across studies, Fisher's combination of p-values, 
and forming a combined list of the best SNPs from within each study. 

Key words and phrases: Whole genome scans, hypothesis testing, ran- 
dom effects, Wald test, multiple comparison. 



1. INTRODUCTION 

Case-control genome-wide association (GWA) studies
are used to detect associations of disease with
genetic markers (alleles of single nucleotide
polymorphisms or SNPs) across the genome by comparing
individuals with disease (cases) to disease-free
individuals (controls). A widely accepted approach 
for identifying and confirming an association is to 
conduct an initial discovery study to detect promis- 
ing SNPs and then to validate the associations in 
data from independent studies, as, for example, in 
Easton et al. (2007). Both power calculations (e.g., 
Skol et al., 2007) and calculations of the probabil- 
ity of detecting disease-associated SNPs (Gail et al., 
2008a) indicate that large numbers of cases and con- 
trols are needed for a successful discovery study if 
one is interested in common alleles with small odds 
ratios (e.g., odds ratio per allele = 1.2), such as 
have been found in GWA studies for breast (Eas- 
ton et al., 2007) and prostate (Yeager et al., 2007) 
cancer. A recent study of diabetes (Zeggini et al., 
2008) illustrated that combining data from several 
studies could improve discovery efforts, compared to 
the separate analyses of the component studies. In 
some diseases, such as thyroid cancer or amyotrophic
lateral sclerosis (ALS), it is not possible to accrue
large numbers of cases and controls in a single region
or study center; in this context, data will need
to be combined for successful discovery. In this paper 
we compare several approaches to using data from 
several smaller GWA studies to discover promising 
disease-associated SNPs that require further valida- 
tion studies. 

We compare procedures to combine data from
genome-wide association studies both in terms of the
power to detect a disease-associated SNP while controlling
the experiment-wide (including genome-wide)
significance level, and in terms of the detection
probability. The detection probability is the probability
that a particular disease-associated SNP will be
among the T most promising SNPs selected on the
basis of low p-values (or large chi-square test statistics).

In Section 2 we describe models for disease asso- 
ciation, including a fixed effects model that assigns 
the same log-odds ratio to each disease SNP and a 
random effects model that allows this log-odds ratio 
to vary across studies. In Section 3 we review the 
concept of detection probability for a single GWA 
study and extend the concept for several procedures 
for combining data from S case-control studies. We 
also define and compute power for these procedures, 
while controlling the experiment-wide significance
level (Section 4). Section 5 contains numerical re- 
sults to compare procedures with respect to detec- 
tion probability and power. Some conclusions are 
given in Section 6. 

2. DATA AND MODELS 

We assume that genotypes for $N$ SNPs from the
same genotyping platform are available for
case-control studies $s = 1, \ldots, S$. In this paper we
let $N = 500{,}000$. Study $s$ includes $n_s$ cases and $n_s$
controls. Let $X_i = 0$, 1 or 2 be the number of minor
alleles at locus $i$ for $i = 1, \ldots, N$, and let $Y = 1$ for
diseased and $Y = 0$ for nondiseased subjects. Suppose
SNPs $1, \ldots, M$ are associated with disease, while
SNPs $M + 1, \ldots, N$ are not, resulting in the model
for disease

(1)  $\operatorname{logit}\{P_s(Y = 1 \mid X_1, \ldots, X_N)\} = \mu_s + \sum_{i=1}^{M} \beta_i^s X_i.$

Thus, we assume that the log-odds ratios for the
nondisease-associated SNPs are equal to zero. In
numerical studies in Section 5, we assume that all
disease-associated SNPs have the same log-odds ratio
within a study, $\beta_i^s = \beta^s$ for $i = 1, \ldots, M$ and
for $s = 1, \ldots, S$. We model variation of $\beta^s$ among
studies in two ways. In the fixed effects model we
set $\beta^s = \beta$ for $s = 1, \ldots, S$, as might happen if the
cases and controls for the $S$ studies were sampled
from the same homogeneous population. Under a
random effects model, the log-odds ratios for the
disease-related SNPs are independent normal variables,
$\beta^s \sim N(\beta, \tau^2)$, $s = 1, \ldots, S$. As tagging SNPs
are typically only markers in linkage disequilibrium
(LD) with the true causal disease SNPs, this model
captures the impact of variation in LD patterns on
$\beta^s$ across study populations.
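
The two models for the study-specific log-odds ratios can be illustrated with a minimal sketch. The function name `draw_study_log_odds` is hypothetical, and the values $\beta = \log(1.3)$ and $\tau = 0.05$ are simply the settings used in the numerical studies of this paper; setting $\tau = 0$ recovers the fixed effects model.

```python
import math
import random


def draw_study_log_odds(beta, tau, n_studies, rng):
    """Draw one log-odds ratio per study from N(beta, tau^2).

    tau = 0 corresponds to the fixed effects model (all studies share beta);
    tau > 0 corresponds to the random effects model.
    """
    return [rng.gauss(beta, tau) for _ in range(n_studies)]


rng = random.Random(1)           # fixed seed so the sketch is reproducible
beta = math.log(1.3)             # mean per-allele log-odds ratio
fixed = draw_study_log_odds(beta, 0.0, 5, rng)     # fixed effects
varying = draw_study_log_odds(beta, 0.05, 5, rng)  # random effects
```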

We have assumed that log-odds ratios are strictly 
zero for the $N - M$ nondisease-associated SNPs.
This "strong null hypothesis" is plausible because, 
if there is no nearby disease SNP, then no amount of 
LD among nearby SNPs can induce an association 
between a marker SNP and disease. 

3. METHODS TO COMPUTE DETECTION 
PROBABILITY FROM COMBINED STUDIES 

3.1 Review of Detection Probability for a Single 
Case-Control GWA Study 

In a single GWA study, if disease is rare and the
SNP scores $X_i$ are independent in the source population,

(2)  $\operatorname{logit}\{P(Y = 1 \mid X_i)\} = \mu^* + \beta_i X_i, \quad i = 1, \ldots, N,$

in the case-control population (Gail et al., 2008a). In
(2), $\mu^* = \mu + \log\{E(\exp(\sum_{k \ne i} \beta_k X_k))\} + \log(\pi_1/\pi_0)$,
where $\pi_1$ is the proportion of cases in the source
population that are in the case-control study, and
$\pi_0$ is the analogous proportion for controls. $E$ is the
expectation operator.

The null hypothesis of no association for the $i$th
SNP, $H_0: \beta_i = 0$, can be tested using the Wald statistic
for a trend in risk with the number of minor
alleles, $W_i = \hat\beta_i^2 / \operatorname{var}(\hat\beta_i)$, where $\hat\beta_i$ denotes the
maximum likelihood estimate for model (2) and its variance
$\operatorname{var}(\hat\beta_i)$ is computed under the retrospective
sampling (Gail et al., 2008a). Alternatively, one could
use the score test for trend (Armitage, 1955). Under
the null hypothesis of no association, both the Wald
and the score test have one degree of freedom chi-square
($\chi^2_1$) distributions. These tests correspond to
additive (or codominant) genotype scores (Sasieni,
1997) and yield the same value whether the major
or minor allele is positively associated with disease
(Devlin and Roeder, 1999; Pfeiffer and Gail, 2003).
Moreover, under the rare disease assumption, the $W_i$
are independent, which facilitates the calculation of
detection probability (Gail et al., 2008a).

A particular SNP, for example, SNP $k$, is T-selected
or simply selected if its associated Wald statistic
(or p-value) is among the top $T$ test statistic values
(or $T$ lowest p-values), that is, $\operatorname{rank}(W_k) > N - T$.
The probability that a particular disease-associated
SNP, for example, SNP $i$, is T-selected is the detection
probability (DP), that is, $DP = P(\operatorname{rank}(W_i) > N - T)$.
The proportion positive (PP) is the fraction
of selected SNPs that are true disease-associated
SNPs.

3.2 Combined List of SNPs 

Here, each of the $S$ studies is analyzed separately.
The Wald test statistics $W_j^s$, $j = 1, \ldots, N$, based on
model (2) are ranked within study $s$, for $s = 1, \ldots, S$,
and in each study the top $T/S$ SNPs are selected.
We then create a "combined list" of the union of
the sets of $T/S$ SNPs selected from each study. We
let $T^c$ be the number of distinct SNPs that are $T/S$-selected
in at least one of the $S$ studies. $T^c$ is not
a fixed number, but a random variable, with $T/S \le T^c \le T$,
depending on the amount of overlap among
the top $T/S$ SNPs from the $S$ studies.

As the $S$ studies are independent, the probability
that disease SNP $i$ is $T/S$-selected in $k$ out of $S$
studies is given by

$P(\text{SNP } i \text{ is } T/S\text{-selected in } k \text{ studies}) = \sum_{A_k} \prod_{l \in A_k} DP_i^l \prod_{l \notin A_k} (1 - DP_i^l),$

where $DP_i^s$ denotes the detection probability for the
$i$th disease SNP in study $s$, that is, $DP_i^s = P(\operatorname{rank}(W_i^s) > N - T/S)$,
and the sum is over all $S!/\{k!(S - k)!\}$ ways of selecting
the set of $k$ indices, $A_k$, from the set $\{1, \ldots, S\}$.
$DP_i^s$ is computed either
under a fixed effects or random effects model for
the log-odds ratios of the disease-associated SNPs.
If the studies are exchangeable and $DP_i^s = DP_i$ for
all $s$, $P(\text{SNP } i \text{ is } T/S\text{-selected in } k \text{ studies})$ simplifies
to a binomial probability and the expected number
of studies that $T/S$-select the $i$th disease SNP is
$S \cdot DP_i$.

The combined detection probability, namely, the
probability that the $i$th disease SNP is $T/S$-selected
in at least one of the $S$ studies, is

(3)  $DP_i^c = 1 - \prod_{s=1}^{S} (1 - DP_i^s).$
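
As an illustration of the two probabilities above, the following minimal sketch enumerates the index sets $A_k$ directly and evaluates the combined detection probability (3). The function names are hypothetical, and the study-specific detection probabilities are treated as given inputs.

```python
from itertools import combinations
from math import prod


def prob_selected_in_k(dps, k):
    """P(a disease SNP is T/S-selected in exactly k of the S studies).

    dps holds the study-specific detection probabilities; the studies are
    assumed independent, so the sum runs over all index sets A_k of size k.
    """
    n_studies = len(dps)
    total = 0.0
    for chosen in combinations(range(n_studies), k):
        chosen = set(chosen)
        total += prod(
            dps[s] if s in chosen else 1.0 - dps[s] for s in range(n_studies)
        )
    return total


def combined_dp(dps):
    """Probability of selection in at least one study, as in (3)."""
    return 1.0 - prod(1.0 - dp for dp in dps)
```

With equal detection probabilities the first function reduces to a binomial probability, and `combined_dp(dps)` equals `1 - prob_selected_in_k(dps, 0)`.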



For special settings, analytic expressions for $DP_i^s$
given in Gail et al. (2008a) can be used in (3) to approximate
$DP_i^c$. When all the studies have the same
sample size and when there is only a single disease-associated
SNP, $M = 1$, that has the same fixed log-odds
ratio $\beta$ in (2) for each individual study,

(4)  $DP^c \approx 1 - [F_{H_1}(\chi^2_{1,1-T/(SN)})]^S.$

In expression (4) $\chi^2_{1,1-T/(SN)}$ denotes the $1 - T/(SN)$
quantile of a central $\chi^2_1$ distribution, and $F_{H_1}$ denotes
the distribution function of a noncentral chi-square distribution $\chi^2_1(\delta)$ with
noncentrality $\delta = \beta^2/\sigma_1^2$, where $\sigma_1^2$ is given in equation
(21) in the Appendix.
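
Expression (4) can be evaluated with standard normal functions alone, because a $\chi^2_1(\delta)$ variate is the square of a $N(\sqrt{\delta}, 1)$ variate. The sketch below is illustrative only: the function names are hypothetical, and the noncentrality $\delta$ is passed in directly rather than computed from the Appendix variance formula.

```python
from statistics import NormalDist

_STD = NormalDist()


def chi2_1_quantile(upper_tail):
    """Quantile of a central chi-square(1) with the given upper-tail probability."""
    z = -_STD.inv_cdf(upper_tail / 2.0)   # two-sided normal critical value
    return z * z


def ncx2_1_cdf(x, delta):
    """CDF of a noncentral chi-square(1, delta), using W = (Z + sqrt(delta))^2."""
    root_x, shift = x ** 0.5, delta ** 0.5
    return _STD.cdf(root_x - shift) - _STD.cdf(-root_x - shift)


def combined_list_dp(delta, top_t, n_studies, n_snps):
    """Approximate combined-list detection probability, as in (4)."""
    cutoff = chi2_1_quantile(top_t / (n_studies * n_snps))
    return 1.0 - ncx2_1_cdf(cutoff, delta) ** n_studies
```

As a sanity check, with $\delta = 0$ the function returns $1 - \{1 - T/(SN)\}^S$, the chance that a null SNP enters the combined list.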

The expected proportion of positive findings out
of the $T^c$ SNPs is approximately $\sum_{i=1}^{M} DP_i^c / T$,
because, as demonstrated in simulations (Section
5.1), there is very little overlap among selected SNPs
across studies and, therefore, $T^c$ is usually close to
$T$.

3.3 Pooled Individual Level Data 

We show in Section 3.4 that a meta-analytic ap- 
proach has equivalent efficiency to pooling individ- 
ual level data. Therefore, in numerical studies below 
we only use the meta-analytic approach. Nonethe- 
less, it is instructive to outline an analysis of indi- 
vidual level data from S studies with the following 
fixed effects model. 

We assume that the log-odds parameter, $\beta_i$, for
disease SNP $i$ is the same in all studies, leading to

(5)  $\operatorname{logit}(p_{si}) = \operatorname{logit}(P_s(Y = 1 \mid X_i)) = \mu_s^* + \beta_i X_i, \quad s = 1, \ldots, S,$

where $\mu_s^*$ denotes the study-specific intercept that
accommodates differences in disease prevalence and
differences in sampling fractions among the different
studies. The Wald statistic for the $i$th SNP is computed
by first finding the estimate $\hat\beta_i$ that maximizes
the likelihood

(6)  $L(\beta_i, \mu_1^*, \ldots, \mu_S^*) = \prod_{s} \prod_{j} p_{sj}^{y_{sj}} (1 - p_{sj})^{1 - y_{sj}}.$

The information matrix to compute the variance of
$\hat\beta_i$ depends on the study-specific intercepts $\mu_s^*$. An
expression for $\operatorname{var}(\hat\beta_i) = \sigma_{Si}^2$ is provided in equation
(20) in the Appendix. The corresponding Wald test
statistic $W_i = \hat\beta_i^2/\sigma_{Si}^2$ has a central $\chi^2_1$ distribution
if $\beta_i = 0$ and a noncentral $\chi^2_1(\delta)$ distribution with
$\delta = \beta_i^2/\sigma_{Si}^2$ otherwise.

Selection of the top $T$ SNPs is based on ranking
the Wald statistics $W_i$, $i = 1, \ldots, N$, computed from
model (5). If $M = 1$, $n_s = n$, and $\beta_i^s = \beta_i$ for
$s = 1, \ldots, S$, then, following Gail et al. (2008a),

(7)  $DP \approx 1 - F_{H_1}(\chi^2_{1,1-T/N}),$

where $F_{H_1}$ is the distribution function of a noncentral $\chi^2_1(\delta)$ distribution with
noncentrality parameter $\delta = \beta_i^2/\sigma_{Si}^2$.

3.4 Meta-Analytic Approaches 

We first estimate study-specific log-odds ratios $\beta_i^s$
for the $i$th SNP, $i = 1, \ldots, N$, by fitting model (2)
separately to each SNP for each study and then combine
study-specific maximum likelihood estimates $\hat\beta_i^s$
to obtain an overall estimate of disease association
for the $i$th SNP. This can be done using a fixed effects
model (Mantel and Haenszel, 1959; Yusuf et
al., 1985) or a random effects model (DerSimonian
and Laird, 1986) for disease SNPs.

For the fixed effects model, the combined SNP-specific
estimate is

(8)  $\hat\beta_i^F = \sum_{s=1}^{S} w_i^s \hat\beta_i^s,$

where $w_i^s = (1/\sigma_{is}^2)(\sum_{k=1}^{S} 1/\sigma_{ik}^2)^{-1}$. Under the null
hypothesis of no association, $\hat\beta_i^F$ has an asymptotic
normal distribution with mean zero and variance
$\operatorname{var}(\hat\beta_i^F) = (\sum_{s=1}^{S} 1/\sigma_{is}^2)^{-1}$. As shown in the
Appendix, $\operatorname{var}(\hat\beta_i^F) = \sigma_{Si}^2$, the variance of the
maximum likelihood estimate based on model (5).
Thus, the two approaches are equally efficient under
the fixed effects model and in Section 5 we only
study the meta-analytic approach.
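
The inverse-variance weighting in (8) can be sketched in a few lines. The function name is hypothetical, and the study-specific estimates and variances are assumed to be available from the separate fits of model (2).

```python
def fixed_effects_meta(estimates, variances):
    """Inverse-variance weighted (fixed effects) combination, as in (8).

    estimates: study-specific log-odds ratio estimates for one SNP
    variances: their estimated variances
    Returns the combined estimate, its variance, and the 1-df Wald statistic.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    beta_f = sum(p * b for p, b in zip(precisions, estimates)) / total_precision
    var_f = 1.0 / total_precision
    wald = beta_f ** 2 / var_f
    return beta_f, var_f, wald
```

For example, two studies with estimates 0.2 and 0.4 and common variance 0.01 give a combined estimate of 0.3 with variance 0.005.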

Under a random effects model (DerSimonian and
Laird, 1986), estimates $\hat\beta_i^s$ are assumed to follow
a linear model, $\hat\beta_i^s = \beta_i^s + e_i^s$, where $\beta_i^s$ is a normal
variate with mean $\beta_i$ and variance $\tau^2$, the $e_i^s$
are normally distributed with mean zero and variance
$\sigma_{is}^2$, and $\beta_i^s$ and $e_i^s$ are independent. Thus, under
the random effects model $\operatorname{var}(\hat\beta_i^s) = \sigma_{is}^2 + \tau^2$.
Note that this model is equivalent to the random
effects model for disease SNPs in Section 2 and that
$E(\hat\beta_i^s)^2 = \beta_i^2 + \sigma_{is}^2 + \tau^2$, which can be large even when
$\beta_i = 0$. The strong null hypothesis for nondisease-associated
SNPs, however, corresponds to a fixed
effects model with $\beta_i = 0$ or, equivalently, to a degenerate
random effects model with $\beta_i^s = 0$ and $\tau^2 = 0$.

Replacing the $\sigma_{is}^2$ by their estimates reported in the
individual studies, we have (DerSimonian and Laird,
1986)

$\hat\tau^2 = \max\left\{0, \frac{\sum_s u_{is}(\hat\beta_i^s - \hat\beta_i^F)^2 - (S - 1)}{\sum_s u_{is} - \sum_s u_{is}^2 / \sum_s u_{is}}\right\},$

where $u_{is} = 1/\hat\sigma_{is}^2$ and $\hat\beta_i^F$ is given by (8). The random
effects meta-analytic estimate of the association
of the $i$th SNP with disease is then given by

(9)  $\hat\beta_i^R = \sum_{s=1}^{S} v_{is} \hat\beta_i^s,$

where $v_{is} = (\hat\tau^2 + \hat\sigma_{is}^2)^{-1} / \{\sum_{k=1}^{S} (\hat\tau^2 + \hat\sigma_{ik}^2)^{-1}\}$. The
variance of $\hat\beta_i^R$ is therefore approximated by
$\operatorname{var}(\hat\beta_i^R) = \{\sum_{s=1}^{S} (\hat\tau^2 + \hat\sigma_{is}^2)^{-1}\}^{-1}$.
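
The moment estimator of the between-study variance and the resulting random effects estimate (9) can be sketched as follows. This is a minimal illustration with a hypothetical function name; it assumes at least two studies and takes the study-specific estimates and variances as given.

```python
def dersimonian_laird(estimates, variances):
    """DerSimonian-Laird random effects combination for one SNP.

    Returns (tau2_hat, beta_r, var_r): the truncated moment estimate of the
    between-study variance, the random effects estimate as in (9), and its
    approximate variance. Requires at least two studies.
    """
    n_studies = len(estimates)
    u = [1.0 / v for v in variances]
    sum_u = sum(u)
    # Fixed effects (inverse-variance) estimate, needed for the Q statistic.
    beta_f = sum(ui * b for ui, b in zip(u, estimates)) / sum_u
    q = sum(ui * (b - beta_f) ** 2 for ui, b in zip(u, estimates))
    denom = sum_u - sum(ui * ui for ui in u) / sum_u
    tau2_hat = max(0.0, (q - (n_studies - 1)) / denom)
    # Re-weight with the between-study variance added to each study variance.
    v = [1.0 / (tau2_hat + var) for var in variances]
    sum_v = sum(v)
    beta_r = sum(vi * b for vi, b in zip(v, estimates)) / sum_v
    var_r = 1.0 / sum_v
    return tau2_hat, beta_r, var_r
```

When the study estimates agree closely, the truncation at zero makes the random effects estimate coincide with the fixed effects estimate.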

In order for the between-study variance $\tau^2$ to be
reliably estimated, the number of studies $S$ cannot
be too small. For the fixed effects model, $\hat\beta_i^F$ becomes
asymptotically normal as the $n_s$ increase. For the
random effects model, $\hat\beta_i^R$ becomes asymptotically
normal as $S$ increases.

The detection probabilities are computed by ranking
the Wald statistics $W_i^F = (\hat\beta_i^F)^2/\sigma_{Si}^2$, for the
fixed effects meta-analytic approach, or
$W_i^R = (\hat\beta_i^R)^2/\operatorname{var}(\hat\beta_i^R)$ for the random effects meta-analytic
approach.

3.5 Sums of Test Statistics and Fisher 
Combination of p-Values 

Let $W_i^s$ denote the Wald test statistics for SNP
$i$ in study $s$ obtained from fitting (2) to the study-specific
data. The combined test statistic is

(10)  $W_i = \sum_{s=1}^{S} W_i^s,$

which, for the nondisease-associated SNPs, has a
central $\chi^2_S$ distribution. For the disease-associated
SNPs, and conditional on $\beta_i^s$, $W_i$ has a noncentral
$\chi^2_S(\delta)$ distribution with noncentrality parameter
$\delta = \sum_{s=1}^{S} (\beta_i^s)^2/\sigma_{is}^2$. For $M = 1$ and $\beta_i = \beta$, the detection
probability is well approximated by (7). For this
special case $\delta = \beta^2 S/\sigma_1^2$, where $\sigma_1^2$ is specified in the
Appendix formula (21).

Instead of combining the Wald statistics, one can
combine p-values $p_i^s$ across studies, through
$p_i^F = \prod_{s=1}^{S} p_i^s$ (Fisher, 1932), and rank SNPs based on $p_i^F$.
Under the null hypothesis, $-2 \log p_i^F = -2 \sum_{s=1}^{S} \log p_i^s$



ON COMBINING GENOME-WIDE ASSOCIATION STUDIES



has a central $\chi^2_{2S}$ distribution. Numerous other combinations
of p-values have been proposed and studied
(Loughin, 2004). We therefore also assessed the
performance of the Liptak-Stouffer combination of
p-values, given by $LS = \sum_{s=1}^{S} \Phi^{-1}(1 - p_i^s)/\sqrt{S}$, that
has a normal distribution with mean zero and variance
one under the null hypothesis (Liptak, 1958).
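
Both p-value combinations are simple to compute. The sketch below uses hypothetical function names and takes the $S$ study-specific p-values for one SNP as input; it returns the test statistics only and does not compute their reference-distribution tail probabilities.

```python
import math
from statistics import NormalDist


def fisher_statistic(pvalues):
    """Fisher's combination: -2 * sum(log p_s), chi-square(2S) under the null."""
    return -2.0 * sum(math.log(p) for p in pvalues)


def liptak_stouffer(pvalues):
    """Liptak-Stouffer combination: sum of Phi^{-1}(1 - p_s) divided by sqrt(S)."""
    std = NormalDist()
    z_scores = [std.inv_cdf(1.0 - p) for p in pvalues]
    return sum(z_scores) / math.sqrt(len(pvalues))
```

SNPs would then be ranked by large values of either statistic.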

4. POWER OF VARIOUS APPROACHES TO 
COMBINING GWA STUDIES 

Except for the Fisher and Liptak-Stouffer methods
of combining p-values, we computed the statistical
power of the approaches to combining data
presented in Sections 3.2-3.5 analytically based on 
asymptotic theory, and also tested analytical results 
in simulations. The power is the probability that the 
test statistic for a given SNP will fall into the prede- 
termined critical region that is chosen to control the 
significance level for multiple testing of the N geno- 
types and S studies. In contrast to the ranking pro- 
cedures for detection probabilities, the power for any 
particular SNP does not depend on the test statis- 
tic for any other SNP. We therefore usually omit the 
SNP index in what follows. The rejection region is 
chosen based on the strong null hypothesis that the 
log-odds ratios for the nondisease-associated SNPs 
are always equal to zero, regardless of the model that 
gives rise to the effects for the disease-associated 
SNPs. 

We set $\alpha = 0.05/N = 10^{-7}$ to account for multiple
testing. Further control of multiplicity for $S$ is
described below.

4.1 Combine Lists of Significant SNPs from 
Each Study 



As in Section 3.2, we compute study-specific Wald
statistics $W_j^s$, $j = 1, \ldots, N$, $s = 1, \ldots, S$, based on
model (2). We determine significance based on
whether $W_j^s$ exceeds the significance threshold
$\chi^2_{1,1-\alpha}$, the $1 - \alpha$ quantile of a $\chi^2_1$ distribution. As we
are combining results from $S$ studies, we replace $\alpha$
by $\alpha/S$ to control the experimentwise error at 0.05.
An exact calculation replaces $\alpha$ by $\alpha^* = 1 - (1 - \alpha)^{1/S}$,
but for small $\alpha$ this $\alpha^*$ is very nearly $\alpha/S$.
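
The closeness of the two per-study levels is easy to check numerically. The following sketch (hypothetical function name) computes both; `math.log1p` and `math.expm1` are used only to avoid rounding error when $\alpha$ is as small as $10^{-7}$.

```python
import math


def per_study_levels(alpha, n_studies):
    """Return (alpha / S, 1 - (1 - alpha)^(1/S)) for S independent studies."""
    bonferroni = alpha / n_studies
    # 1 - (1 - alpha)^(1/S), evaluated stably for very small alpha.
    exact = -math.expm1(math.log1p(-alpha) / n_studies)
    return bonferroni, exact


bonferroni, exact = per_study_levels(1e-7, 5)
relative_gap = (exact - bonferroni) / bonferroni
```

For $\alpha = 10^{-7}$ and $S = 5$ the relative gap between the two levels is far below one part in a million.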

The power of the combined list approach under
an alternative $H_1$ is thus

(11)  $P_{H_1}(W^s > \chi^2_{1,1-\alpha/S} \text{ in at least one study}) = 1 - \prod_{s=1}^{S} P_{H_1}(W^s \le \chi^2_{1,1-\alpha/S}).$

When all the disease-associated SNPs for the different
studies have the same fixed effect, $\beta^s = \beta$, $P_{H_1}$ is
generated by a $\chi^2_1(\delta)$ distribution with $\delta = \beta^2/\sigma_s^2$,
where $\sigma_s^2$ is given in equation (21) in the Appendix.
When all the studies have the same sample size, then
(11) reduces to $1 - [F_{H_1}(\chi^2_{1,1-\alpha/S})]^S$, which is equivalent
to (4) with $T = \alpha N$.

To obtain the power when the log-odds ratios of
the disease-associated SNPs arise from the random
effects model, $\beta^s \sim N(\beta, \tau^2)$, $s = 1, \ldots, S$, we integrate
(11) over the distribution of the independent
study-specific $\beta^s$ parameters to obtain

$P_{H_1}(W^s > \chi^2_{1,1-\alpha/S} \text{ in at least one study}) = 1 - \prod_{s=1}^{S} \int P_{H_1}(W^s \le \chi^2_{1,1-\alpha/S}; \beta^s)\, dF(\beta^s),$

where $F$ denotes the normal distribution with mean
$\beta$ and variance $\tau^2$.

4.2 Meta-Analytic Approaches 

Fixed effects meta-analytic approach. Based on
asymptotic normal theory, the power for the test
statistic $W^F = (\hat\beta^F)^2/\operatorname{var}(\hat\beta^F)$ is

(12)  $P_{H_1}(W^F > \chi^2_{1,1-\alpha}).$

Under the fixed effects model for the disease-associated
SNPs, $P_{H_1}$ is generated by a $\chi^2_1(\delta)$ distribution with
$\delta = (\beta^F)^2/\sigma_S^2$, where $\beta^F = \sum_{s=1}^{S} \beta^s w_s$,
$w_s = (1/\sigma_s^2)(\sum_{k=1}^{S} 1/\sigma_k^2)^{-1}$ and $\sigma_S^2$ is given in the Appendix
equation (20). The power under the random effects
model for disease-associated SNPs is obtained by integrating
equation (12) over the distribution of $\beta^s$,
namely, $P_{H_1}(W^F > \chi^2_{1,1-\alpha}) = \int_{\beta^1} \cdots \int_{\beta^S} P_{H_1}(W^F > \chi^2_{1,1-\alpha}; \beta^1, \ldots, \beta^S)\, dF(\beta^1) \cdots dF(\beta^S)$.
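
Under the fixed effects model with a common $\beta$, the powers in (11) and (12) can be compared directly from the study-specific noncentralities $\delta_s = \beta^2/\sigma_s^2$. The sketch below uses hypothetical function names and takes the $\delta_s$ as inputs. It relies on the inverse-variance weights, which give the meta-analytic noncentrality as $\sum_s \delta_s$ when all studies share the same $\beta$.

```python
from statistics import NormalDist

_STD = NormalDist()


def chi2_1_cutoff(alpha):
    """The 1 - alpha quantile of a central chi-square(1) distribution."""
    z = -_STD.inv_cdf(alpha / 2.0)
    return z * z


def ncx2_1_sf(x, delta):
    """P(W > x) for W ~ noncentral chi-square(1, delta), W = (Z + sqrt(delta))^2."""
    root_x, shift = x ** 0.5, delta ** 0.5
    return _STD.cdf(shift - root_x) + _STD.cdf(-root_x - shift)


def power_combined_list(deltas, alpha):
    """Power of the combined list approach, as in (11), at level alpha / S per study."""
    cutoff = chi2_1_cutoff(alpha / len(deltas))
    miss_all = 1.0
    for delta in deltas:
        miss_all *= 1.0 - ncx2_1_sf(cutoff, delta)
    return 1.0 - miss_all


def power_fixed_meta(deltas, alpha):
    """Power of the fixed effects meta-analytic test, as in (12), for a common beta."""
    return ncx2_1_sf(chi2_1_cutoff(alpha), sum(deltas))
```

For instance, with five studies each having $\delta_s = 6$ and $\alpha = 10^{-7}$, `power_fixed_meta` is much larger than `power_combined_list`, in line with the single degree of freedom argument.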



Random effects meta-analytic approach. The use
of asymptotic normal theory for the random effects
meta-analytic approach when there are few studies
is problematic, as the type I error rate can be substantially
inflated (Follmann and Proschan, 1999).
Follmann and Proschan therefore suggest using a
$t_{S-1}$ reference distribution rather than a standard
normal distribution. Using the t-approximation, the
power of the random effects meta-analytic approach
is

(13)  $P_{H_1}(W^R > F_{1,S-1,1-\alpha}),$

where $P_{H_1}$ is generated by a noncentral $F_{1,S-1}$ distribution,
with noncentrality parameter $\delta = (\beta^R)^2/\sigma^2$,
and $F_{1,S-1,1-\alpha}$ is the $1 - \alpha$ quantile of a central $F_{1,S-1}$
distribution. However, under the strong null hypothesis
that the log-odds ratio parameters for the
nondisease-associated SNPs are strictly zero and do
not vary across studies, one can replace the
$F_{1,S-1,1-\alpha}$ cutoff value in (13) by $\chi^2_{1,1-\alpha}$, as for the
fixed effects meta-analytic approach. In simulations
we study the power for the random effects meta-analytic
approach using both cutoff values for the
test statistic.

The power under the random effects model is obtained
by integrating equation (13), or $P_{H_1}(W^R > \chi^2_{1,1-\alpha})$,
over the random effects distribution of the
$\beta^s$, similar to the fixed effects meta-analytic approach
given above.

4.3 Power of the Sum of Test Statistics 

The power for the test statistic $W = \sum_{s=1}^{S} W^s$ is
given by

(14)  $P_{H_1}(W > \chi^2_{S,1-\alpha}),$

where $P_{H_1}$ is generated by a $\chi^2_S(\delta)$ distribution with
$\delta = \sum_{s=1}^{S} (\beta^s)^2/\sigma_s^2$.

We do not compute the power for Fisher's
$-2 \sum_{s=1}^{S} \log p_s$ or the Liptak-Stouffer combination
of p-values analytically, because the distribution of
the $S$ p-values $p_1, \ldots, p_S$ cannot be obtained in a
manageable form under the alternative.

5. SIMULATIONS 

5.1 Simulation Methods to Estimate the 
Detection Probability, DP 

We used the methods in Gail et al. (2008a) for a
single study to simulate data separately from each of
the case-control studies, $s = 1, \ldots, S$. At each SNP
$i = 1, 2, \ldots, N$, we randomly and independently selected
a minor allele frequency, $\eta_i$, from the distribution
of minor allele frequencies in CGEMS
(https://caintegrator.nci.nih.gov/cgems/), as
described in Gail et al. (2008a). In each replicate
of the simulations described below, minor allele frequencies
were re-assigned to each SNP in this way.
We assumed that the $N$ genotypes were statistically
independent in the source population, the disease
is rare and the Hardy-Weinberg equilibrium
holds at each locus. Given $\beta_i$, we sampled $\hat\beta_i$ from
$N(\beta_i, \sigma_i^2(\beta_i))$ independently for each $i = 1, \ldots, N$ to
generate realizations of the Wald statistics rapidly
in GAUSS (Aptech Systems, 2005). The Wald statistics
were computed as $W_i = \hat\beta_i^2/\sigma_i^2(\beta_i)$, which has
the same asymptotic distribution as $\hat\beta_i^2/\sigma_i^2(\hat\beta_i)$.



For each disease model and parameter setting we
generated NSIM = 1000 independent simulations.
Under either the fixed or random effects disease model,
and conditional on $\eta_i$ and $\beta^s$, we computed
$\sigma_s^2 = \operatorname{var}(\hat\beta^s)$ and then drew $\hat\beta^s$ from $N(\beta^s, \sigma_s^2)$. The
study-specific estimates were then used in the procedures
in Sections 3.2, 3.4 and 3.5 to compute DP.

Define $I(m, ISIM, T) = 1$ if the rank of the test statistic
for disease SNP $m$ falls into the top $T$ ranks of
the $N$ ranked values of the test statistics in simulation
$ISIM$, and 0 otherwise. The detection probability
for each approach is then estimated by

$\widehat{DP} = NSIM^{-1} M^{-1} \sum_{ISIM=1}^{NSIM} \sum_{m=1}^{M} I(m, ISIM, T).$

PP was estimated from $\widehat{PP} = (\widehat{DP}) M/T$. For the
combining lists approach, we modified these formulas
to take into account variation in $T^c$. Letting
$I(m, ISIM, T/S) = 1$ if the disease SNP is
$T/S$-selected in any study in simulation $ISIM$ and 0
otherwise, we estimated DP as above with $I(m, ISIM, T/S)$
in place of $I(m, ISIM, T)$, and we estimated
PP from $\widehat{PP} = NSIM^{-1} \sum_{ISIM} \sum_{m} I(m, ISIM, T/S)/T^c(ISIM)$,
where $T^c(ISIM)$ is the cardinality
of the union of the $S$ $T/S$-selected sets of SNPs.
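
The estimators above can be illustrated with a deliberately simplified Monte Carlo sketch. It is not the GAUSS procedure used for the results in this paper: it assumes a single disease SNP ($M = 1$), works directly with a noncentrality $\delta$ for a one degree of freedom statistic, and uses a much smaller $N$ and NSIM than in the text so that it runs quickly. All names are hypothetical.

```python
import math
import random


def estimate_dp(delta, n_snps, top_t, n_sim, rng):
    """Monte Carlo estimate of DP and PP for one disease SNP (M = 1).

    The disease SNP statistic is (Z + sqrt(delta))^2 and the n_snps - 1
    null statistics are Z^2; the SNP is T-selected when fewer than top_t
    null statistics exceed it.
    """
    shift = math.sqrt(delta)
    hits = 0
    for _ in range(n_sim):
        w_disease = (rng.gauss(0.0, 1.0) + shift) ** 2
        n_larger = sum(
            1 for _ in range(n_snps - 1) if rng.gauss(0.0, 1.0) ** 2 > w_disease
        )
        if n_larger < top_t:
            hits += 1
    dp_hat = hits / n_sim
    pp_hat = dp_hat * 1 / top_t      # PP = DP * M / T with M = 1
    return dp_hat, pp_hat


dp_hat, pp_hat = estimate_dp(delta=9.0, n_snps=2000, top_t=20,
                             n_sim=100, rng=random.Random(0))
```

With $\delta = 0$ the estimate should fluctuate around $T/N$, and it approaches one as $\delta$ grows.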

5.2 Simulations to Estimate Power 

We estimated power by simulations for each of
the procedures in Section 4. We fixed the allele frequency
for the disease-associated SNP at $\eta = 0.2673$,
the mean allele frequency used in the DP calculations.
Estimates were otherwise obtained as in
Section 5.1, but for a single locus.

We used NSIM = 100,000 replicates of outcome
data and for each replicate, each of the test statistics
was calculated, and the true power estimated as the
proportion of replicates which were significant at the
experimentwise level $\alpha = 10^{-7}$.

5.3 Simulation Results for Detection Probability 

We evaluated the DP for $T = 20$, 100, 1000, 10,000
and 25,000, which, when divided by $N$, corresponds
to respective selection fractions 0.00004, 0.0002,
0.002, 0.02 and 0.05. We studied $M = 1$ and $M = 10$
disease SNPs, and let $S = 5$ with $n_s = 400$ cases
and controls and $S = 10$ with $n_s = 200$ cases and
controls for both the fixed and the random effects
models for $\beta$, and we focused on $\beta = \log(1.3)$. To
assess the impact of varying study sizes, with $S = 5$,
we let $n_1 = 1000$ and $n_s = 250$, $s = 2, \ldots, 5$.
Table 1

Detection Probability (DP) and Proportion Positive (PP) in percent for five methods of combining data from S studies with
$n_s$ cases and $n_s$ controls for fixed effects models with $\beta = \log(1.3)$, N = 500,000 SNPs, and random allele frequency $\eta$

                         T = 20            T = 100           T = 1000          T = 10,000        T = 25,000
Method                   DP       PP       DP       PP       DP       PP       DP       PP       DP       PP

S = 5, $n_s$ = 400, M = 1 true disease SNP
Comb list                7.20     0.36    15.70     0.16    38.10     0.04    73.80     0.01    85.30    0.003
Ave $T^c$                20.0             100.0             999.0            9919.5           24504.0
Meta fixed              74.20     3.71    81.50     0.82    91.00     0.09    96.80     0.01    98.20    0.003
Meta random             74.20     3.71    81.50     0.82    91.00     0.09    96.80     0.01    98.20    0.003
Sum of W                53.90     2.70  [illeg.] [illeg.]   79.20     0.08    90.30     0.01    93.90    0.004
$-2\sum_s \ln(p_s)$     58.40     2.92    66.90     0.67    80.20     0.08    90.80     0.01    94.50    0.004

S = 5, $n_s$ = 400, M = 10 true disease SNPs
Comb list                7.75     3.89    16.95     1.70    41.61     0.42    74.42     0.08    85.46     0.03
Ave $T^c$                20.0              99.8             998.0            9914.4           24494.6
Meta fixed              73.15    36.58    82.45     8.25    91.87     0.92    97.53     0.10    98.78    0.040
Meta random             73.15    36.58    82.45     8.25    91.87     0.92    97.53     0.10    98.78    0.040
Sum of W                53.12    26.56  [illeg.] [illeg.]   79.70     0.80    91.11     0.09    94.87    0.038
$-2\sum_s \ln(p_s)$     55.98    27.99  [illeg.] [illeg.]   81.27     0.81    91.68     0.09    95.34    0.038

S = 10, $n_s$ = 200, M = 1 true disease SNP
Comb list                1.10     0.06     2.50     0.04     7.30     0.02    50.70     0.01    68.60    0.003
Ave $T^c$                20.0             100.0             999.1            9910.7           24445.0
Meta fixed              73.20     3.66    80.50     0.81    90.80     0.09    96.40     0.01    98.30    0.004
Meta random             73.20     3.66    80.50     0.81    90.80     0.09    96.40     0.01    98.30    0.004
Sum of W                39.00     1.95  [illeg.] [illeg.]   68.60     0.07    83.80     0.01    88.90    0.004
$-2\sum_s \ln(p_s)$     42.00     2.10  [illeg.] [illeg.]   69.70     0.07    84.30     0.01    89.60    0.004

S = 10, $n_s$ = 200, M = 10 true disease SNPs
Comb list                1.50     0.75     4.44     0.44    17.20     0.17    49.48     0.05    67.64     0.03
Ave $T^c$                20.0             100.0             998.9            9908.6           24440.5
Meta fixed              73.04    36.52    82.63     8.26    91.62     0.92    97.17     0.10    98.47     0.04
Meta random             73.04    36.52    82.63     8.26    91.62     0.92    97.17     0.10    98.47     0.04
Sum of W                38.53    19.27    51.37     5.14    69.54     0.70    85.59     0.09    90.61     0.04
$-2\sum_s \ln(p_s)$     41.76    20.88    54.28     5.43    71.62     0.72    86.44     0.09    91.09     0.04

S = 5, $n_1$ = 1000, $n_s$ = 250, s = 2, ..., 5, M = 1 true disease SNP
Comb list               22.00     1.10    33.52     0.34    56.34     0.06    80.17     0.01    88.71    0.004
Ave $T^c$                20.0             100.0             999.1            9919.8           24503.9
Meta fixed              74.85     3.75    82.83     0.83    91.33     0.09    97.01     0.01    98.44    0.004
Meta random             72.35     3.62    81.08     0.81    90.35     0.09    96.54     0.01    98.05    0.004
Sum of W                54.34     2.73    65.03     0.65    79.49     0.08    90.94     0.01    94.46    0.004
$-2\sum_s \ln(p_s)$     55.72     2.79    66.06     0.66    80.02     0.08    91.20     0.01    94.66    0.004



For the fixed effects model (Table 1), the two meta-analytic
approaches had the highest DP for all study
designs, followed by Fisher's combination of p-values
and then the sum of the Wald statistics. The "combined
list approach" had the lowest DP of all approaches.
For example, for $T = 20$, DP for the list
was only 7.2% for five studies with $n_s = 400$ cases
and 400 controls each, and a single true disease-associated
SNP, $M = 1$, while DP was 53.9% and
58.4% for the sum of Wald tests and the Fisher p-value
combination respectively, and 74.2% for both
meta-analytic approaches. In the same setting, for
$T = 25{,}000$, DP for the combined list approach was
85.3%, while it was 94% or higher for all other approaches
(Table 1). For $S = 10$ and $n_s = 200$, the
combined list approach had even smaller DP values,
because each of the component studies had a very
small DP. Similar patterns were observed for $M = 10$.
The number of disease-associated SNPs, $M$, did
not strongly impact DP for any of the methods under
the fixed effects model. For $S = 5$ and varying
study sizes, $n_1 = 1000$ and $n_s = 250$, $s = 2, \ldots, 5$, for
$M = 1$, the performance of the combined list approach
was slightly better, with DP = 22.0% for






R. M. PFEIFFER, M. H. GAIL AND D. PEE 



Table 2

Detection Probability (DP) and Proportion Positive (PP) for five methods for combining data from S studies, with $n_s$ cases
and $n_s$ controls for the random effects model for $\beta \sim N(\log(1.3), 0.05^2)$, with N = 500,000 SNPs, and random allele
frequency $\eta$

                         T = 20            T = 100           T = 1000          T = 10,000        T = 25,000
Method                   DP       PP       DP       PP       DP       PP       DP       PP       DP       PP

S = 5, $n_s$ = 400, M = 1 true disease SNP
Comb list               12.50     0.63    23.40     0.23    48.30     0.05    77.60     0.01    88.50    0.004
Ave $T^c$                20.0             100.0             999.0            9919.3           24503.5
Meta fixed              73.80     3.69    82.30     0.82    91.40     0.09    97.50     0.01    98.60    0.004
Meta random             73.80     3.69    82.50     0.83    91.40     0.09    97.50     0.01    98.60    0.004
Sum of W                55.70     2.79    67.10     0.67    80.60     0.08    92.00     0.01    95.10    0.004
$-2\sum_s \ln(p_s)$     58.20     2.91    68.80     0.69    81.80     0.08    92.40     0.01    95.30    0.004

S = 5, $n_s$ = 400, M = 10 true disease SNPs
Comb list               11.61     5.86    22.50     2.26    47.25     0.47    76.83     0.08    86.36     0.04
Ave $T^c$                19.9              99.7             997.5            9913.8           24494.3
Meta fixed              71.99    36.00    81.51     8.15    90.81     0.91    97.04     0.10    98.45     0.04
Meta random             71.96    35.98    81.50     8.15    90.74     0.91    97.04     0.10    98.45     0.04
Sum of W                54.85    27.43    66.06     6.61    79.84     0.80    91.14     0.09    94.55     0.04
$-2\sum_s \ln(p_s)$     57.39    28.70    67.91     6.79    81.15     0.81    91.73     0.09    94.88     0.04

S = 10, $n_s$ = 200, M = 1 true disease SNP
Comb list                2.00     0.10     4.70     0.05    18.90     0.02    54.80     0.06    70.40    0.003
Ave $T^c$                20.0             100.0             999.0            9910.5           24444.3
Meta fixed              74.10     3.71    82.30     0.82    92.00     0.09    97.50     0.01    98.90    0.004
Meta random             74.10     3.71    82.30     0.82    92.00     0.09    97.50     0.01    98.90    0.004
Sum of W                42.00     2.10    52.80     0.53    69.20     0.07    85.50     0.01    90.30    0.004
$-2\sum_s \ln(p_s)$     44.70     2.24    55.00     0.55    71.10     0.07    86.40     0.01    91.30    0.004

S = 10, $n_s$ = 200, M = 10 true disease SNPs
Comb list                2.03     1.02     5.75     0.58    20.47     0.21    54.12     0.05    70.32     0.03
Ave $T^c$                20.0             100.0             998.8            9907.8           24440.4
Meta fixed              72.24    36.12    81.74     8.17    91.57     0.92    96.92     0.10    98.50     0.04
Meta random             72.22    36.11    81.73     8.17    91.55     0.92    96.89     0.10    98.50     0.04
Sum of W                41.81    20.91    54.18     5.42    70.95     0.71    85.74     0.09    90.79     0.04
$-2\sum_s \ln(p_s)$     44.71    22.36    56.42     5.64    72.43     0.72    86.48     0.09    91.17     0.04

S = 5, $n_1$ = 1000, $n_s$ = 250, s = 2, ..., 5, M = 1 true disease SNP
Comb list               25.70     1.29    38.20     0.38    57.90     0.06    81.10     0.01    88.90    0.004
Ave $T^c$                20.0             100.0             999.1            9919.7           24503.8
Meta fixed              74.70     3.74    83.10     0.83    91.90     0.09    97.20     0.01    98.60    0.004
Meta random             72.30     3.62    81.80     0.82    91.10     0.09    96.50     0.01    98.20    0.004
Sum of W                55.80     2.79    66.80     0.67    81.10     0.08    91.40     0.01    94.70    0.004
$-2\sum_s \ln(p_s)$     56.80     2.84    67.90     0.68    82.30     0.08    91.80     0.01    94.80    0.004



T = 20, because study s = 1 had a larger size and 
higher DP. 

The proportions positive (PP) were largest for small T and larger M. As T
increased, DP increased but PP declined (Table 1). If the purpose of the
study is to serve as an initial screen designed to capture disease SNPs but
tolerate a large number of false positive results (i.e., very small PP),
T = 25,000 might be of interest. If the purpose is to select a small number
of promising SNPs for further study, data for T = 20 commend the
meta-analytic approaches. For the settings we studied, the Liptak-Stouffer
combination of p-values had a lower DP than Fisher's combination of
p-values. For example, for S = 10 and $n_s = 400$, with M = 1 true
disease-associated SNP, the values of DP were 55.5%, 64.8%, 76.2%, 86.9% and
91.2% for the Liptak-Stouffer combination for T = 20, 100, 1000, 10,000 and
25,000, while the corresponding DP values of the Fisher combination were
58.4%, 66.9%, 80.2%, 90.8% and 94.5%. Therefore, we did not tabulate results
for the Liptak-Stouffer combination of p-values.

Table 3

Detection Probability (DP) and Proportion Positive (PP) for five methods for combining data from S studies, with $n_s$ cases
and $n_s$ controls, for the random effects model with $\beta_s \sim N(\log(1.3), 0.5^2)$, with N = 500,000 SNPs, and random allele
frequency $\eta$

Method                         T = 20        T = 100       T = 1000     T = 10,000     T = 25,000
                            DP     PP      DP     PP      DP     PP      DP     PP      DP     PP

$S = 5$, $n_s = 400$, $M = 1$ true disease SNP
Comb list                86.10   4.54   89.50   0.91   94.40   0.09   98.10   0.01   98.30  0.004
Ave $T_c$                 19.1           99.0          997.9         9918.8        24503.6
Meta fixed               57.90   2.90   62.50   0.63   70.20   0.07   77.70   0.01   81.50  0.003
Meta random              55.90   2.80   60.20   0.60   66.20   0.07   75.80   0.01   80.00  0.003
$\sum_s W_s$             93.20   4.66   94.80   0.95   97.00   0.10   98.20   0.01   98.70  0.004
$-2\sum_s \ln(p_s)$      93.40   4.67   94.70   0.95   97.20   0.10   98.20   0.01   98.80  0.004

$S = 5$, $n_s = 400$, $M = 10$ true disease SNPs
Comb list                56.74  36.62   89.29  10.12   94.53   0.96   97.46   0.10   98.21  0.040
Ave $T_c$                 16.6           89.9          985.8         9903.5        24486.0
Meta fixed               58.35  29.18   63.70   6.37   70.85   0.71   78.73   0.08   81.94  0.033
Meta random              55.17  27.59   60.33   6.03   67.74   0.68   75.93   0.08   79.70  0.032
$\sum_s W_s$             92.46  46.23   94.79   9.48   96.77   0.97   98.32   0.10   98.92  0.040
$-2\sum_s \ln(p_s)$      92.36  46.18   94.68   9.47   96.73   0.97   98.36   0.10   98.87  0.040

$S = 10$, $n_s = 200$, $M = 1$ true disease SNP
Comb list                84.70   4.52   89.30   0.91   94.70   0.10   97.90   0.01   98.60  0.004
Ave $T_c$                 19.0           98.7          997.1         9907.8        24441.4
Meta fixed               67.30   3.37   72.00   0.72   78.80   0.08   85.60   0.01   88.30  0.004
Meta random              60.60   3.03   66.30   0.66   73.00   0.07   82.00   0.01   85.20  0.003
$\sum_s W_s$             96.20   4.81   96.90   0.97   98.50   0.10   99.40   0.01   99.70  0.004
$-2\sum_s \ln(p_s)$      96.30   4.82   97.10   0.97   98.40   0.10   99.30   0.01   99.70  0.004

$S = 10$, $n_s = 200$, $M = 10$ true disease SNPs
Comb list                45.52  26.12   86.79  10.12   94.60   0.10   98.00   0.10   99.00  0.041
Ave $T_c$                 17.9           87.7          979.9         9884.0        24415.5
Meta fixed               65.10  32.55   70.74   7.07   77.86   0.78   84.85   0.09   88.03  0.035
Meta random              59.61  29.81   65.21   6.52   73.13   0.73   80.93   0.08   84.29  0.034
$\sum_s W_s$             95.45  47.73   96.97   9.70   98.35   0.98   99.26   0.10   99.59  0.040
$-2\sum_s \ln(p_s)$      95.24  47.62   96.82   9.68   98.34   0.98   99.21   0.10   99.55  0.040

$S = 5$, $n_1 = 1000$, $n_s = 250$, $s = 2, \ldots, 5$, $M = 1$ true disease SNP
Comb list                83.80   4.37   87.60   0.88   92.80   0.09   96.70   0.01   97.90  0.004
Ave $T_c$                 19.3           99.2          998.1         9918.5        24501.9
Meta fixed               60.10   3.01   64.00   0.64   69.00   0.07   76.40   0.01   81.20  0.003
Meta random              50.80   2.54   55.70   0.56   62.20   0.06   71.40   0.01   75.70  0.003
$\sum_s W_s$             90.30   4.52   92.80   0.92   95.50   0.10   97.80   0.01   98.90  0.004
$-2\sum_s \ln(p_s)$      90.20   4.51   92.30   0.92   95.60   0.10   97.60   0.01   98.80  0.004
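The DP and PP columns of Table 3 appear to be linked by a simple identity. If each of the M disease SNPs has probability DP of being among the T selected SNPs, the expected number of true positives is M x DP, so PP should be close to M x DP / T. This relation is inferred from the tabulated values, not stated in this section, and the helper name `proportion_positive` is hypothetical. A minimal Python sketch that checks it against a few "Meta fixed" entries of Table 3:

```python
def proportion_positive(dp_percent, m_true, t_selected):
    """PP (in percent) implied by DP (in percent), assuming PP = M * DP / T."""
    return dp_percent * m_true / t_selected


# (DP, M, T, tabulated PP) for the "Meta fixed" rows of Table 3, S = 5, n_s = 400.
rows = [
    (57.90, 1, 20, 2.90),
    (62.50, 1, 100, 0.63),
    (58.35, 10, 20, 29.18),
    (63.70, 10, 100, 6.37),
]

for dp, m, t, pp_tabulated in rows:
    pp_implied = proportion_positive(dp, m, t)
    print(f"M={m:2d} T={t:3d} DP={dp:.2f} implied PP={pp_implied:.3f} "
          f"tabulated PP={pp_tabulated}")
```

For the combined list the number of selected SNPs is presumably the realized list length (Ave $T_c$) rather than T, so the identity can only hold approximately for that row.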

For the random effects model (Table 2) with a relatively small
between-study standard deviation, $\tau = 0.05$, and with
$\beta = \log(1.3)$ for the disease-associated SNPs, the DP results were
very similar to those for the fixed effects model. Again, the meta-analytic
approaches had better DP than the combined list, sum of Wald tests, or
Fisher p-value combinations. However, for the random effects model with a
very large standard deviation, $\tau = 0.5$ (Table 3), Fisher's combination
of p-values and the sum of the Wald statistics had much better DP than the
meta-analytic approaches, as the large variation among the $\beta_s$ for
the disease-associated SNPs caused some of them to be negative, reducing
the meta-analytic estimate of the overall effect (Table 3). For
$\tau = 0.5$ the combined list approach also had higher DP than the two
meta-analytic approaches. Even for T = 25,000, for S = 5 studies with 400
cases and 400 controls each, and a single true disease-associated SNP,
M = 1, DP was 81.5% and 80.0% for the fixed and random effects
meta-analytic approaches, compared to 98.3%, 98.7% and 98.8% for the
combined list, the sum of Wald statistics and Fisher's combination of
p-values (Table 3). For T = 20, DP for the combined list approach was
considerably lower when the number of disease-associated SNPs was M = 10,
because in each study the 10 disease SNPs compete against each other for
only T/S = 4 top positions. This competition is less pronounced in Tables 1
and 2 because the magnitude of log-odds ratios for disease-associated SNPs
does not reach the large values that sometimes occur in simulations in
Table 3 with $\tau = 0.5$. As in the fixed effects setting, the
Liptak-Stouffer combination of p-values had a lower DP than Fisher's
combination of p-values and the sum of Wald tests for the random effects
models with $\tau = 0.05$ and $\tau = 0.5$ and, therefore, we did not
tabulate these results.
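To see how often a study-specific effect changes sign under the two random effects settings, one can evaluate $P(\beta_s < 0)$ for $\beta_s \sim N(\log(1.3), \tau^2)$ directly. The following Python sketch does this with the standard library only; the function name `prob_negative_effect` is hypothetical and the calculation ignores sampling error in the estimated log-odds ratios.

```python
from math import log
from statistics import NormalDist


def prob_negative_effect(mean_log_or, tau):
    """P(beta_s < 0) when beta_s ~ N(mean_log_or, tau^2)."""
    return NormalDist(mu=mean_log_or, sigma=tau).cdf(0.0)


for tau in (0.05, 0.5):
    p_neg = prob_negative_effect(log(1.3), tau)
    print(f"tau = {tau}: P(beta_s < 0) = {p_neg:.3g}")
```

With $\tau = 0.5$ roughly three in ten study-specific effects have the opposite sign to the mean effect, whereas with $\tau = 0.05$ a sign change is essentially impossible, which is in line with the contrast between Tables 2 and 3.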

For fixed effects models (Table 1), studies with S = 5 and $n_s = 400$
resulted in higher DP than studies with the same total number of subjects
but S = 10 and $n_s = 200$ for the combined list, the sum of Wald
statistics and Fisher's combination of p-values, for both M = 1 and M = 10
disease SNPs; no such difference was seen for the meta-analytic approaches.
Under the random effects model with $\tau = 0.05$ (Table 2), DP was higher
for the combined list, sum of Wald statistics and Fisher's combination of
p-values for S = 5 with $n_s = 400$ than for S = 10 with $n_s = 200$. In
this case the meta-analytic procedures had comparable or slightly higher DP
for S = 10, $n_s = 200$. Under the random effects model with $\tau = 0.5$
(Table 3), all procedures except the combined lists had higher DP with
S = 10, $n_s = 200$.

5.4 Simulation Results for Power 

Power estimates based on NSIM = 100,000 simulations are plotted against
odds ratios (Figure 1) for S = 5 with $n_s = 400$ and for S = 10 with
$n_s = 200$ under the fixed effects model. The odds ratio was assumed to be
the same in all S studies. For all combinations of S and $n_s$, the fixed
effects meta-analytic approach had the largest power for all odds ratios.
It gave exactly the same results as the random effects meta-analytic
approach with the critical region defined by the $\chi^2_{1,1-\alpha}$
quantile, leading to indistinguishable lines in Figure 1. Using the
$F_{1,S-1,1-\alpha}$ cutoff value for the random effects meta-analytic
approach resulted in extremely low power. Additionally, for the
meta-analytic approaches, S = 5 with $n_s = 400$ resulted in exactly the
same power as S = 10 with $n_s = 200$, as the total sample size was the
same. The sum of Wald test statistics and Fisher's p-value combination gave
very similar results, with 80% power for odds ratios near 1.4 compared to
93% power for the meta-analytic approaches. The power of the combined list
approach was noticeably lower, and reached 80% only for an odds ratio of
1.75. These empirical power estimates agreed well with the analytic power
calculations (data not shown).
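The observation that the meta-analytic power depends only on the total sample size can be illustrated with the usual normal approximation for a two-sided one degree-of-freedom Wald test. The Python sketch below is not the analytic power calculation used for the figures: the per-subject variance constant `unit_variance`, the helper names, and the choice $\alpha = 10^{-7}$ as default are assumptions made for illustration.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def wald_power(beta, se, alpha=1e-7):
    """Approximate power of a two-sided Wald test when beta_hat ~ N(beta, se^2)."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = beta / se
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def pooled_se(n_per_study, n_studies, unit_variance):
    """Standard error of the inverse-variance weighted estimate, assuming
    each study has var(beta_hat_s) = unit_variance / n_per_study."""
    per_study_variance = unit_variance / n_per_study
    return sqrt(1.0 / (n_studies / per_study_variance))


UNIT_VARIANCE = 8.0  # hypothetical constant, chosen only for illustration
se_5x400 = pooled_se(400, 5, UNIT_VARIANCE)
se_10x200 = pooled_se(200, 10, UNIT_VARIANCE)
for odds_ratio in (1.2, 1.4, 1.6):
    beta = log(odds_ratio)
    print(odds_ratio, wald_power(beta, se_5x400), wald_power(beta, se_10x200))
```

Because the pooled variance is `unit_variance / (n_studies * n_per_study)`, both designs give the same standard error and hence the same approximate power.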

For the random effects model for the disease-associated SNPs,
$\beta_s \sim N(\beta, \tau^2)$, with a small random effects standard
deviation, $\tau = 0.05$, the estimated power of these procedures was very
similar to their power under the fixed effects model (Figure 2). If the
random effects standard deviation was $\tau = 0.5$, there was enough
heterogeneity in association effects across studies that the log odds were
positive in some studies and negative in others, leading to a reduction in
the meta-analytic summary estimate of association, and to substantial loss
in power compared to all other procedures (Figure 3). For example, for
S = 5 with $n_s = 400$ (Figure 3), an expected log-odds ratio of log(1.6)
was required to attain 80% power for the meta-analytic approach. On the
other hand, the sum of Wald tests and Fisher's combination are invariant to
sign changes of the effects, and had very high power. For example, even for
mean log-odds ratio $\beta = 0$, the power of those two procedures was near
80% for S = 10 with $n_s = 200$ and S = 5 with $n_s = 400$. The combined
list procedure also had much higher power than the meta-analytic
approaches, for example, 82% for a mean log-odds ratio of log(1.4) for
S = 5 with $n_s = 400$. Again, for the fixed effects meta-analysis and the
random effects meta-analysis with the critical region defined by the
$\chi^2_{1,1-\alpha}$ quantile, the lines completely overlap and are
indistinguishable in Figures 2 and 3.
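The sign invariance mentioned above can be made concrete with a small deterministic example. The Python sketch below assumes that the study-specific Wald statistic is $W_s = (\hat\beta_s / \mathrm{se}_s)^2$ and that the fixed effects estimate uses inverse-variance weights; the function names and the input values are hypothetical.

```python
from math import sqrt


def fixed_effect_z(betas, ses):
    """Z statistic of the inverse-variance weighted (fixed effects) estimate."""
    weights = [1.0 / se ** 2 for se in ses]
    beta_f = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return beta_f * sqrt(sum(weights))


def sum_of_wald(betas, ses):
    """Sum over studies of the squared standardized estimates."""
    return sum((b / se) ** 2 for b, se in zip(betas, ses))


ses = [0.1, 0.1, 0.1, 0.1]
same_sign = [0.4, 0.4, 0.4, 0.4]
mixed_sign = [0.4, -0.4, 0.4, -0.4]

print(fixed_effect_z(same_sign, ses), sum_of_wald(same_sign, ses))
print(fixed_effect_z(mixed_sign, ses), sum_of_wald(mixed_sign, ses))
```

Flipping the sign of half of the estimates leaves the sum of Wald statistics unchanged but cancels the pooled estimate, which is the mechanism behind the loss of meta-analytic power for $\tau = 0.5$.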

The power of the Liptak-Stouffer combination of 
p-values for all settings studied for the figures was 
very close to the power of the Fisher statistic and 
therefore is not presented. For example, for the fixed 
effects model presented in Figure 1, for an OR = 
1.5, with 200 cases and 200 controls for 10 studies, 
the power of the Fisher combination was 0.9581 and 
for the Liptak-Stouffer combination was 0.9535. For 
400 cases and 400 controls and 5 studies, the power 
for an OR = 1.4 was 0.8057 for Fisher's and 0.8167 
for the Liptak-Stouffer combination of p-values.
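For readers who wish to experiment with the two p-value combinations, the following Python sketch implements Fisher's statistic $-2\sum_s \ln(p_s)$, referred to a chi-square distribution with 2S degrees of freedom, and a Liptak-Stouffer combination based on the normal quantile transformation. The equal default weights and the function names are assumptions; the weighting actually used in the simulations is not specified in this section.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def fisher_combined_p(p_values):
    """P-value of -2 * sum(ln p_s) under a chi-square with 2S degrees of freedom."""
    n_studies = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # half of Fisher's statistic
    # Chi-square survival function for even degrees of freedom 2S:
    # exp(-x/2) * sum_{i=0}^{S-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, n_studies):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


def liptak_stouffer_combined_p(p_values, weights=None):
    """P-value of the weighted sum of normal quantiles of the p-values."""
    if weights is None:
        weights = [1.0] * len(p_values)  # equal weights are an assumption
    # -inv_cdf(p) equals inv_cdf(1 - p) but is numerically safer for small p.
    z = sum(-w * STD_NORMAL.inv_cdf(p) for w, p in zip(weights, p_values))
    z /= sqrt(sum(w * w for w in weights))
    return STD_NORMAL.cdf(-z)


concentrated = [1e-6, 0.5, 0.5, 0.5, 0.5]  # one study carries the signal
print(fisher_combined_p(concentrated), liptak_stouffer_combined_p(concentrated))
```

In the `concentrated` example, where a single study provides nearly all of the evidence, Fisher's combination yields the smaller combined p-value.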

Fewer studies with larger sample size (S = 5, $n_s = 400$) resulted in
higher power than more studies with the same total number of subjects
(S = 10 and $n_s = 200$) for all procedures (with the exception of the
meta-analytic approaches, for which the power was the same) under the fixed
effects model and under the random effects model with $\tau = 0.05$
(Figures 1 and 2). When $\tau = 0.5$, however, the power of all approaches
but the combined list was larger for S = 10 studies with $n_s = 200$
(Figure 3).

Fig. 1. Power of various approaches for combining data from S = 10 GWAS studies with $n_s = 200$ cases and $n_s = 200$ controls
each (blue lines) or S = 5 studies with $n_s = 400$ cases and $n_s = 400$ controls each (red lines) under the fixed effects model for
disease-associated SNPs, with $\eta = 0.2673$.

6. DISCUSSION 

As is evident from the literature on detection probability (Gail et al.,
2008a, 2008b) and power calculations (Skol et al., 2006, 2007), large
sample sizes are needed to have a good chance to discover
disease-associated SNPs with odds ratios commonly found in GWA studies.
Because in many settings the available studies are too small, there is a
need to combine information from several studies. Our results indicate that
the fixed effects meta-analysis has higher DP than other methods. Only when
there is severe heterogeneity in association effects across studies, such
that the log odds is positive in some studies and negative in others, can
methods such as the sum of Wald tests or Fisher's combination of p-values
have larger DP than the fixed effects and random effects meta-analytic
approaches.

Loughin (2004) found, in an extensive simulation study of the power of
various quantile combination methods for p-values, that Fisher's method had
very good power compared to other transformation functions (including
normal and logistic) when a minority of the tests provided most of the
evidence against the null hypothesis. When signal was distributed equally
over all p-values, the normal transformation proved to be somewhat more
powerful than Fisher's approach. We therefore also assessed the performance
of the Liptak-Stouffer combination of p-values. In our simulation studies,
under both the fixed effects and the random effects model for the
disease-associated SNPs, Fisher's combination of p-values had higher DP
than the Liptak-Stouffer combination of p-values, but had very similar
power.

Fig. 2. Power of various approaches for combining data from S = 10 GWAS studies with $n_s = 200$ cases and $n_s = 200$ controls
each (blue lines) or S = 5 studies with $n_s = 400$ cases and $n_s = 400$ controls each (red lines) under the random effects model
for disease-associated SNPs, $\beta_s \sim N(\beta, 0.05^2)$, with $\eta = 0.2673$.

Although differences in LD patterns across popu- 
lations can result in associations in opposite direc- 
tions, as illustrated by CDKN1AS31R, in the sup- 
plement to Zeggini et al. (2008), in most circum- 
stances the heterogeneity will not be sufficient to 
render the meta-analytic approaches less powerful 
than other approaches. The method of combining 
lists of promising SNPs from each of the component 
studies has the lowest DP in most circumstances, 
and especially when there are many small studies of 
comparable size. Our results for power give a similar 
ranking of procedures to combine information as for 
DP, despite the fact that these two criteria are far 
from equivalent (Gail et al., 2008b). 



We used the critical values from a one degree-of-freedom chi-square
distribution in power calculations for the random effects meta-analytic
procedure discussed by DerSimonian and Laird (1986). Under the strong null
hypothesis that the log odds is strictly zero, we conducted simulations and
verified that such critical values yielded proper size in simulations for
$\alpha = 0.1$ and $\alpha = 0.01$. It is not certain that the size is
nominal for $\alpha = 10^{-7}$, however, and therefore the power from the
random effects meta-analytic approach may not be strictly comparable to
that of the fixed effects meta-analysis. If in fact null SNPs satisfy only
a weak null hypothesis, namely, that their log odds have mean zero but vary
about this mean, then a critical value based on an F distribution might be
more appropriate (Follmann and Proschan, 1999). Using such a critical value
reduces power to almost zero, however, as shown in Figures 1, 2 and 3. In
Section 2 we argue that a strong null hypothesis is plausible.

Fig. 3. Power of various approaches for combining data from S = 10 GWAS studies with $n_s = 200$ cases and $n_s = 200$ controls
each (blue lines) or S = 5 studies with $n_s = 400$ cases and $n_s = 400$ controls each (red lines) under the random effects model
for disease-associated SNPs, $\beta_s \sim N(\beta, 0.5^2)$, with $\eta = 0.2673$.

We assumed that the same platform was used to 
analyze the samples in each study and thus that 
data were available on the same set of SNPs in each 
study. Zeggini et al. (2008) used two algorithms that 
employed HapMap data to impute missing SNPs in
some studies. We also assumed that adequate qual- 
ity control procedures had been followed in all the 
studies and that there was proper control for pop- 
ulation stratification. Otherwise, the assumption of 
a strong null hypothesis for nondisease-associated 
SNPs would not hold. 

APPENDIX 

Variance Computation for Model (5) 

For ease of exposition we omit the SNP specific subscript, and denote (5)
by $p^s_x = 1 - q^s_x = P(Y = 1 \mid X = x; \mu_s, \beta)$, for
$s = 1, \ldots, S$. The maximum likelihood estimate $\hat\beta$ is found by
solving the score equations corresponding to the likelihood (6),

(15) $\partial/\partial\mu_s \log L = \sum_j (Y_{sj} - p^s_{x_{sj}}) = 0, \quad s = 1, \ldots, S,$

(16) $\partial/\partial\beta \log L = \sum_s \sum_j x_{sj} (Y_{sj} - p^s_{x_{sj}}) = 0,$

where the index j refers to the jth subject in study s. The first set of
equations corresponds to the study specific intercept parameters, and the
last equation corresponds to the common log-odds ratio parameter $\beta$.
The variance is
$\sigma_\beta^2 = \mathrm{var}(\hat\beta) = (I_{22} - I_{21} I_{11}^{-1} I_{12})^{-1}$,
where $I_{11}$, $I_{12}$, $I_{22}$ are submatrices of the information
matrix I from the prospective likelihood:

$(I_{11})_{ij} = -E(\partial^2/\partial\mu_i \partial\mu_j \log L),$

$(I_{12})_j = -E(\partial^2/\partial\mu_j \partial\beta \log L),$

$I_{22} = -E(\partial^2/\partial\beta^2 \log L).$

The expectations of the second derivatives and cross-derivatives of the
prospective log-likelihood are taken with respect to retrospective sampling
distributions $f^s_x = P_s(X = x \mid Y = 1)$ and
$g^s_x = P_s(X = x \mid Y = 0)$, for cases and controls respectively.

As the studies are independent, $I_{11}$ is a diagonal matrix with the
expected second derivatives of the study specific intercept parameters on
the diagonal. Thus, the information matrix reduces to

(17) $I_{22} = \sum_s I_{22,s} = \sum_s n_s \sum_{x=0}^{2} (f^s_x + g^s_x) x^2 p^s_x q^s_x,$

(18) $(I_{21})_s = (I_{12})_s = n_s \sum_{x=0}^{2} (f^s_x + g^s_x) x p^s_x q^s_x,$

(19) $(I_{11})_{ss} = n_s \sum_{x=0}^{2} (f^s_x + g^s_x) p^s_x q^s_x.$

The variance for $\hat\beta$ is then given by

(20) $\sigma_\beta^2 = \mathrm{var}(\hat\beta) = (I_{22} - I_{21} I_{11}^{-1} I_{12})^{-1}.$

For S = 1, (20) reduces to the standard case-control variance,

(21) $\sigma_1^2 = (I_{22} - I_{21} (I_{11})^{-1} I_{12})^{-1}.$



Variance Computation for the Fixed Effects 
Meta-Analytic Approach 

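Recall that $\hat\beta_F = \sum_{s=1}^{S} \hat\beta_s w_s$, where
$w_s = (1/\sigma_s^2) \cdot (\sum_{t=1}^{S} 1/\sigma_t^2)^{-1}$ and, thus,
$\mathrm{var}(\hat\beta_F) = (\sum_{s=1}^{S} 1/\sigma_s^2)^{-1}$.

Using (17) for a single study,

$\sigma_s^2 = \mathrm{var}(\hat\beta_s) = (I_{22,s} - I_{21,s} I_{11,s}^{-1} I_{12,s})^{-1},$

where $I_s$ stands for the study specific Fisher information matrix.
Therefore,

(22) $\sum_{s=1}^{S} 1/\sigma_s^2 = \sum_{s=1}^{S} (I_{22,s} - I_{21,s} I_{11,s}^{-1} I_{12,s}),$

and, thus, $\mathrm{var}(\hat\beta_F) = (\sum_{s=1}^{S} 1/\sigma_s^2)^{-1}$
equals equation (20).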
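The equality of the two variances can also be checked numerically. The Python sketch below evaluates the information sums over genotypes x = 0, 1, 2 for each study and compares the joint-model variance with the fixed effects meta-analytic variance. Several ingredients are assumptions made for illustration and are not taken from the text: Hardy-Weinberg genotype frequencies in controls with allele frequency $\eta$, case genotype frequencies proportional to $g_x e^{\beta x}$, the in-sample risk $p_x = f_x / (f_x + g_x)$ for equal numbers of cases and controls, and all function names.

```python
from math import exp, log


def genotype_distributions(eta, beta):
    """Genotype distributions for cases (f) and controls (g), x = 0, 1, 2."""
    g = [(1.0 - eta) ** 2, 2.0 * eta * (1.0 - eta), eta ** 2]  # HWE, assumed
    unnormalized = [g_x * exp(beta * x) for x, g_x in enumerate(g)]
    total = sum(unnormalized)
    f = [u / total for u in unnormalized]
    return f, g


def study_information(n_s, f, g):
    """Return (I_11, I_21, I_22) for one study with n_s cases and n_s controls."""
    i11 = i21 = i22 = 0.0
    for x in range(3):
        p = f[x] / (f[x] + g[x])  # P(Y = 1 | X = x) within the sample
        weight = n_s * (f[x] + g[x]) * p * (1.0 - p)
        i11 += weight
        i21 += weight * x
        i22 += weight * x * x
    return i11, i21, i22


def joint_variance(studies):
    """Variance of beta_hat from the joint model, using the diagonal I_11."""
    infos = [study_information(n_s, f, g) for n_s, f, g in studies]
    i22 = sum(info[2] for info in infos)
    correction = sum(info[1] ** 2 / info[0] for info in infos)
    return 1.0 / (i22 - correction)


def meta_variance(studies):
    """Inverse of the sum of inverse study-specific variances."""
    inverse_variances = []
    for n_s, f, g in studies:
        i11, i21, i22 = study_information(n_s, f, g)
        inverse_variances.append(i22 - i21 ** 2 / i11)
    return 1.0 / sum(inverse_variances)


f, g = genotype_distributions(eta=0.2673, beta=log(1.3))
five_by_400 = [(400, f, g)] * 5
ten_by_200 = [(200, f, g)] * 10
print(joint_variance(five_by_400), meta_variance(five_by_400))
print(joint_variance(ten_by_200), meta_variance(ten_by_200))
```

Under these assumptions the two designs with the same total sample size give the same variance, consistent with the identical meta-analytic power reported for S = 5, $n_s = 400$ and S = 10, $n_s = 200$.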
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